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Abstract. 

We consider the reconstruction of a two-dimensional discrete image from a set of 
tomographic measurements corresponding to the Radon projection. Assuming that 
the image has a structure where neighbouring pixels have a larger probability to 
take the same value, we follow a Bayesian approach and introduce a fast message- 
passing reconstruction algorithm based on belief propagation. For numerical results, 
we specialize to the case of binary tomography. We test the algorithm on binary 
synthetic images with different length scales and compare our results against a more 
usual convex optimization approach. We investigate the reconstruction error as a 
function of the number of tomographic measurements, corresponding to the number 
of projection angles. The belief propagation algorithm turns out to be more efficient 
than the convex-optimization algorithm, both in terms of recovery bounds for noise- 
free projections, and in terms of reconstruction quality when Gaussian noise is added 
to the projections. 
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1. Introduction 

X-ray computed tomography (CT) P,[2] is a classical 3-D imaging technique in materials 
science or for medical applications. The X-ray beam transmitted through the sample 
is recorded on a planar detector, for different angles of incidence of the beam on the 
sample. Using the Beer-Lambert law for photons absorption, the reconstruction of the 
absorption x of the sample from the measurements y is a linear problem 

y = Fx + w, (1) 

with F the tomography projection matrix and w the measure noise. For a parallel X-ray 
beam, each pixel line of the detector measures the total absorption integrated along a 
horizontal light-ray through the sample. For a detector line of L pixels, a square image 
oi N = L"^ pixels has to be reconstructed from M = L x uq measurements, where rig is 
the number of angles, also called number of projections. 

Classical reconstruction algorithms such as filtered back-projection (FBP) [31 IH [1] 
or the algebraic reconstruction technique (ART) [5] require the linear system ([1]) to be 
sufficiently determined. Therefore, the number of angles ug needed is generally close 
to the number of pixels along a line, L. Nevertheless, many applications would benefit 
from being able to reconstruct from a smaller number of angles. In medical imaging 
for example, reducing the total X-ray dose absorbed by the patient is highly desirable. 
For in-situ tomography where the sample is evolving while it is imaged [6l [7] , reducing 
the number of angles increases the acquisition speed, hence the time-resolution of the 
observations. 

The traditional algorithms mentioned above fare poorly for a limited number of 
angles, and algorithms incorporating prior information on the sample are needed to 
improve the quality of the reconstruction. For example, samples with a discrete number 
of phases and absorption values are frequently encountered in materials science. Using 
this information can make up for the lack of measurements and result in a satisfying 
reconstruction. The field of discrete tomography [8] encompasses the large class of 
algorithms that have been proposed to solve this problem. Various approaches have been 
proposed, from heuristic methods [9] to the approximate resolution of the combinatorial 
problem [10], or of its convex relaxation [TTj . 

In this article, we propose a new algorithm for discrete tomography, that estimates 
the marginal probability for every pixel value. We use a Bayesian approach in order to 
write the a-posteriori probability distribution of discrete images, and a fast approximate 
message passing scheme relying on belief propagation to find the marginal probabilities. 
Within this framework, constraints imposed by the measurements are satisfied by 
iteratively solving a graphical model along the light-rays crossing the sample, known 
as the Potts model in statistical physics. For the specific case of binary images, this 
amounts to solving 1-D Ising models for all light-rays and measurements. 

We start this article with a short review of the state of the art on discrete 
tomography, where we introduce the convex algorithm based on total variation, used 
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for comparison with our algorithm. Then we move on to the presentation of our behef- 
propagation tomographic reconstruction algorithm (BP-tomo) and its relation to the 
field of statistical physics. Numerical results on synthetic data are presented for the case 
of binary images. We establish empirically the recovery phase diagram by decreasing the 
number of projections and determining the undersampling rate separating successful and 
failed reconstruction; it is found that the transition corresponds to an undersampling 
rate of the order of the density of gradients in the image. This bound on undersampling 
rates is a significant improvement over performances of traditional compressed sensing. 
In the realistic case of noisy measurements, reconstructions obtained with BP-tomo are 
compared with results obtained with a constrained total-variation minimization. For 
moderate noise, BP-tomo is found to give a much more accurate reconstruction above 
the critical undersampling rate of the noise-free case. 
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2. State of the art in discrete tomography 

Discrete tomography algorithms perform at the same time the tomographic 
reconstruction and the segmentation of the reconstructed image into objects with a 
finite number of known absorption values. In addition to the constraints on the discrete 
pixel levels, that strongly restrict the space of possible images, additional constraints 
on the gradients of the image are often enforced in order to obtain smooth images. A 
large range of methods has been proposed for discrete tomography, for a review see [8] . 
Here, we only cite a few representative examples, in order to discuss briefiy their scope 
of application and performance. 

Theoretical studies [121 E] have addressed the problem of the minimal number of 
angles needed to reconstruct a binary image, in the noise-free case. An important result 
of geometric tomography [12] is that four different projections are enough to reconstruct 
uniquely a uniform convex object [13]. For more complicated images, degenerate binary 
solutions may exist for the same set of projections, but it is possible to bound the 
distance between two solutions using the projection error of the thresholded ART 
solution [H]. It is also possible [15] to determine the set of pixels uniquely determined 
(in the absence of noise) by their projections. 

As for reconstruction algorithms, a first class of methods [TUl HSl [T71 [HI [19] adopts a 
Bayesian approach in order to make the most of the available information on the sample. 
The a posteriori probability distribution of a discrete image x given measurements y is 



with -Po(x) the a priori probability distribution on the space of images. In the Maximum 
A Posteriori (MAP) setting, the image maximizing the posterior distribution is searched 
for: 



Finding xmap is an NP-hard combinatorial problem. An MCMC (Monte-Carlo Markov 
Chain) algorithm is used in [101 [ISl [13 [TB| in order to sample the Gibbs distribution 
and to obtain an approximation of xuap- Gibbs sampling being very costly in terms 
of numbers of iterations, such algorithms are limited to small images. Therefore, these 
methods are more suitable for techniques with lower spatial resolution than computed 
tomography, such as positron emission tomography (PET) for example. For faster 
computations, a variational Bayes approach was used in [19]. The Bayesian approach is 
very flexible in terms of prior distribution on the images, allowing for a tuned weighting 
of local binary patterns in the image [16]. For binary images, a popular model for 
smooth images is the Ising model from statistical physics, that we shall use as well in 
our belief-propagation algorithm. 

A second class of algorithms [20l [211 [22| [23| [23] consider relaxations of the MAP 
estimation falling into the scope of linear optimization. In this setting, the searched-for 
image x is found by solving a set of linear constraints, such as the constraint y = Fx, 




(2) 



a^MAP = argmax 




(3) 
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or an interval constraint. For mixtures of convex and concave constraints, methods 
from DC (Difference of Convex functions) programming are used to converge to a local 
minimum of the objective function [221 EHl [231 Ell- 

Since the seminal papers by Candes, Romberg, Tao and Donoho [261 [271 [28] , the field 
of compressed sensing has breathed a new lease of life into the use of convex optimization 
for finding sparse solutions to underdetermined inverse problems. Compressed sensing 
proposes to find sparse solutions using a regularization with the convex ii norm, instead 
of the io norm measuring sparsity: problems can then be solved using classical techniques 
of convex optimization [29l [30] . In general, convex optimization methods can be 
computationally less intensive than other types of methods, hence more suitable for 
large images. Candes et al. proved under certain hypotheses that perfect reconstruction 
can be obtained with the ii norm in the noiseless case, and that the error is at most 
proportional to the noise level. These solid mathematical foundations have contributed 
greatly to the success of compressed sensing. Although tomographic projection does 
not satisfy the restrictive hypotheses of the mathematical proofs, compressed sensing 
has proven to work well for tomographic reconstruction. In fact, the founding paper of 
compressed sensing [26] demonstrated perfect reconstruction with ii regularization using 
the example of incomplete tomographic projections of the piecewise-constant Shepp- 
Logan phantom [3]. For the piecewise-constant images of discrete tomography, the 
regularization is on the £i norm of the gradient of the image, that is its total variation 
(TV) 

N 

TV(x) = ^|Vx,| , (4) 

where Vx, is the discrete gradient of the image evaluated at pixel i. Therefore, several 
papers [311 (HI [321 [331 [311 [Ml [36] have used the following convex minimization to 
reconstruct discrete tomography images: 

Xtv = argmin^ ^1 '-^ ~ Fx| |^ + /3TV(x) (5) 

where ||y — Fx|p is the £2 data fidelity term (corresponding to a Gaussian noise prior 
in the MAP setting), and the parameter /3 controls the amount of regularization. It has 
been shown [3ll [HI [321 [331 EH [351 ES] that the total variation reconstruction greatly 
improves the quality of the reconstruction of discrete images for a reduced number of 
projections. There is often a trade-off (controlled by the parameter /3) between removing 
the noise and smoothing out small features [33]. 

In this paper, we use a total-variation minimization algorithm to benchmark the 
quality of the reconstruction of the belief-propagation algorithm. Compared to Eq. 
we also constrain the solution to have values between the minimal discrete value a and 
the maximal value b. We solve 

XcTv = argmin^ -||y - Fx|p + /3TV(x) + (x), (6) 

where i[a,b]{^) is the indicator function of the interval [a,b]. In order to solve the 
minimization problem, we use proximal methods [30] that perform subgradient iterations 
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on non-differentiable terms. Several methods [HTJ ESj are suited for a sum of non- 
different iable functionals (such as i[a,b] and TV), we use here the generahzed forward- 
backward sphtting [38]. We have verified experimentally that with the additional 
interval constraint, better reconstruction results are obtained with Eq. ([6]) than with Eq. 
([5]). To our knowledge, no published paper on discrete tomographic reconstruction has 
yet used an interval constraint in addition to the total variation, although a positivity 
constraint is sometimes used for PET ^36j- For the total variation proximal operator, 
we use the second-order FISTA scheme of Beck and TebouUe [39] for the isotropic total 
variation. 

Finally, several heuristic algorithms have been proposed for discrete tomography. 
One of the simplest [ID] consists in alternating gradient iterations on the data fidelity 
term and clipping (thresholding) continuous values on the discrete labels. Batenburg [41] 
proposed a network-flow algorithm where constraints imposed by pairs of directions are 
satisfied iteratively for binary pixel values. For binary image, Batenburg also introduced 
DART [9], a variation on the gradient descent - clipping method: pixels are progressively 
labeled from the interior of objects to the more uncertain boundaries. This algorithm 
takes into account the spatial continuity of the objects as well as their binary nature. 
Along the same lines, another algorithm was recently proposed by Roux et al. [42], where 
the reconstruction is done for the continuous probability of pixels values, reducing the 
risk of clipping too early the value of a pixel. 

Compared to the existing literature on discrete tomography, we shall see that our 
belief propagation algorithm combines the full power of the Bayesian approach and the 
efficiency of iterative reconstruction algorithms. 
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3. Image reconstruction as a statistical physics problem 
3.1. The Bayesian approach 

Let us first fix the notations. The image that we want to reconstruct sits on a 
two-dimensional grid, and each of its pixels i takes a value in a g-ary alphabet: 
Xj = Xi, • • • 5 Xg- For instance, binary tomography corresponds to the case where q = 2, 
and xi = ~1) X2 = 1- We suppose that M tomographic measurements, = 1 . . .M, 
have been made. Each such measurement gives a number which is the sum of variables 
along a light ray fi: 



where z G denotes the set of all variables Xj that belong to the measurement line /x, 
and Wfj, is the additive noise on measure fi. Notice that, in a slightly more general 
setting, one would measure = J^i^^/i^t^i^i + ""^m' where F^i is a geometrical factor 
giving the fraction of pixel i that is covered by the ray corresponding to measurement 
H [l3]. This more general setting can be handled as well by our method. Furthermore, 
it does not make much difference when the image has a structure in domains, as soon 
as the typical domain size is much larger than one pixel. To keep notations simple we 
thus keep hereafter to the case where F^j = 1, the straightforward extension to more 
general cases is left to the reader. 

The problem is now to reconstruct the original image (that is the N-dimensional 
vector x) from the values of the projection (the M-dimensional vector y). 

Here, we shall adopt a Bayesian approach. As a prior information, we use the fact 
that the image to measure is not a random one, but has a structure in domains, such that 
if a pixel i has the value Xr, then the neighboring pixel has a large probability to have 
the same value. We shall enforce this bias in each direction /i in which a measurement 
has been made. This is a very natural prior for a large class of images. Following the 
Bayesian approach, our goal is thus to sample from the a posteriori distribution given 



where, following the convention in statistical physics, we have rewritten the 
normalization factor of P(x|y) as 2'(y). (In the following, we shall incorporate all 
normalization factors into Z.) Here the 6 function implements the equality = Yli&fj, 
if the measurement is perfect (w^ = 0)- the case of noisy measurement, this 5 
function should be smoothed in order to take into account our information on the noise. 
For instance, when the measurement has Gaussian noise of variance A one could use a 




(7) 



by 





function 6{x) = exp(-xV(2A))/V27rA. 
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Let us denote by p^ the a priori probability for neighboring pixels to have the same 
value along the line fi. Then 



fo(x) 



est JJ 



.5, 



where N^j^ is the number of pixels along line /i, {ij) G fi denote pairs of neighboring 
pixels along direction fi (for instance, in the case of line = 3 in Fig^H there would 
be an interaction between pixels (i) and (ii) and another one between (ii) and (iii)) and 
est is a constant that does not depend on the values of the variables, and that we shall 
thus incorporate into the normalization factor through a redefinition of ^(y). In ([9]) we 
have introduced the so-called "coupling constant" = logj^^. 

With these notations, the a posteriori probability can thus be written as a product 



of M terms, each corresponding to one tomographic projection: 

M 



^(x|y) 



In 



6 



Xi 



(10) 



The value of can be fixed based on previous knowledge of the spatial structure of the 
sample. It would also be possible to be learn its value in the course of the algorithm 
(for instance using the Expectation Maximization approach ^^). 



3.2. A message passing approach: Belief Propagation 

Sampling from the distribution in Eq. ([8]) is a notoriously difficult problem [lOl [8]. 
We shall here use an approach based on a message-passing algorithm called "Belief- 
Propagation" (BP) lini Sni HTJ SH] . It is not an exact algorithm (at least for the present 
problem), in the sense that it is not guaranteed to sample correctly from the distribution 
dH]). However, we shall see that the approximate sampling obtained from BP performs 
empirically very well. 

Let us first reformulate the problem as a graphical model [ISl SH]. We have 
variables Xi and M factor nodes, each one implementing the weight 5{y^ — 
XliG^^*)^"^''^''^'^''^'^^ - The corresponding factor graph is shown in Fig. [H 

We shall not review here the derivation of the belief-propagation algorithm, and 
refer instead to classical books and articles on the subject [171 HE]. There are essentially 
two equivalent ways to derive the BP recursion: one is to see it as a recursion that would 
be exact if the factor graph were a tree (i.e. if it had no loop). Another one is to see it 
as an Ansatz in the usual variational Bayesian inference approach that usually improves 
on the factorized "mean- field" one, leading to an expression for the log-likelihood whose 
maximization leads to the belief-propagation equations. The result is a set of message 
passing equations relating messages that go from variables to nodes and from nodes to 
variables. The message mj_s>-y from a variable z to a node 7 is the marginal probability 
of the variable Xi in absence of the constraint 7 (see Fig. [2]). On the other hand, the 




3 



Figure 1. Construction of the factor graph for the tomographic reconstruction. Each 
tomographic hne in the image (left) corresponds to a factor node in the corresponding 
graphical model (right). Each of the 1,...,M factor nodes imposes the statistical 
weight in eq. (|8]). Here, for instance, the variables denoted (i),(ii) and (iii) are all 
linked to the factor node number 3. 



variables Xi^...^Ar 




factor nodes = 1, . . . , M 

Figure 2. Message passing on the graphical model. Each variable j = 1, . . . , iV sends a 
message raj^^{xi) to all the factor nodes 7 that are connected with it, and each factor 
node ^ = 1, . . . , Af sends a message m^^i{xi) to all variables i that are connected with 
it. 
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message m^_j.j from the factor /i to the variable i is the marginal probability of the 
variable Xi when only constraint fi is present (see Fig. [2]). The message mj_>.^ is easily 
written in terms of the messages m^^i: 

mi^^{xi) oc JJ m^^i(xi) (11) 

where /i G i denotes all the factor nodes (tomographic lines) that contain variable i. 
Note that mi_^^{xi) should be normalized , in the sense that Ylix ^i^'yi^) = 1 ('^e prefer 
however not to write explicitly the normalisation factor and will therefore use the sign 
oc instead of =). The message from a factor node to a variable, on the other hand, 
reads: 

Equations ( fTTi) and (|T2il build a closed set of equations that should be solved for the 
values of the messages. Usually one seeks a solution by iterating these equations starting 
from some randomly chosen initial condition. Assuming that a fixed point of this 
iteration is reached, one obtains a set of messages that solve the BP equations. The BP 
estimate for the marginal distribution of Xj is then given by 

mi{xi) oc ]^m^^i(xi) (13) 

Let us now take a closer look at the recursion: While the iteration from variables to 
factors (in eq. (ITTl) ) is easy to compute (a simple multiplication), the one from nodes 
to variables (in eq. f|T2l) ) is unfortunately more complicated, as it involves a sum over 
all the q possible values of each variable x in the line /i. Moreover, we have to do it 
for each of the A'^ variables in this line, and for each value it can take. At first sight, 
this seems to give a complexity of order N^q^^^ for each iteration. However, one can 
compute the message in (fT2ll in a much more efficient way, namely in 0{qN^) iterations. 
Indeed, in physics language, one can recognize that estimating m^^i{xi) in Eq. f|T2|) is 
nothing else that estimating the marginal of Xi in a one-dimensional graphical model 
known in physics as the Potts ferromagnet in a random field at fixed total magnetization, 
which can be done efficiently by transfer matrix techniques, which are equivalent for this 
problem to using BP. The solution is thus to use BP in order to compute the messages 
of the initial BP problem. 

3.3. Computing messages: belief propagation within belief propagation 

Let us now focus on the following sub-problem: we want to estimate the marginal of a 
variable Xj coupled in a chain of variables, with a statistical weight given by 

P^_,i(x) (X 5{y^ -Yxj)e^^^w^^^^r-k JJ m,-^^(x,) . (14) 

There are two methods to deal with the delta function which we shall refer to (using 
the statistical physics language) as the "canonical method" and the "micro-canonical 



Ks)— O — O — O — O-^ 



Figure 3. Message-passing for the solving tlie sub-problem of estimating the marginal 
on each constraint. The left- moving message and the right-moving message should be 
computed on a one-dimensional line. 



method" . The exact solution is to use the microcanonical method where one is indeed 
summing only over the configurations of the variables that satisfy the delta function 
in Eq. f lM|) . This can be done efficiently using a transfer matrix approach, and we 
have thus implemented this method. It turns out, however, that the canonical method 
is faster. It is also a more natural approach when there exist errors or noise in the 
measurement process. We shall therefore use the latter method in the following. 

The canonical method amounts to relaxing the S{y^ — ^^j^^Xj) constraint, and 
treating it only as an approximate constraint that should hold on average, up to some 
small fiuctuations. More precisely, in the canonical method we replace the delta function 
using a Lagrange multiplier H (the "field" in physics language) that we shall fix later 
on in order to enforce the constraint on average. The statistical weight thus becomes, 
instead of ( !T4|) : 

P^j(x) oc e"'^(^'^"^je''^^^e'^''^0'=)6''''"i'"fc ]^ mj^^(xj) , . (15) 

The value of H is computed using IE'^(X]je/i ^j) = I/m' "where E-^ denotes the expectation 
value with respect to the measure . Of course, the value of H depends on both 
and i. Now, it turns out that one can compute H with an error that becomes negligible 
(in the limit where the number of variables involved in the measurement fi is large), 
by considering the complete problem obtained from (ITSl) by including all the messages 
rrij^^. This complete problem is defined by: 

P^(x) oc e"-^(^''"^ief ''^^e'^''^0'=)e''''"^-"fe JJmj-^^(xj) . (16) 

We now write the nested belief propagation equations for this complete sub-problem. 
Since the interaction is a two-body one, one can write it using a version that involves 
only one type of message, from variables to variables (see [IHl HHj and Fig. |3]). It is also 
equivalent to the so-called transfer matrix approach in statistical mechanics [H]. The 
recursion for the BP messages is easily written. In order to keep notations simple, let 
us assume that the numbering of variables is such that variable i is neighbour to z -|- 1 
along direction /i. We shall denote the messages going from right to left of the line 
as the right-moving messages rj^^^_^{xi), i = 2, . . . , and the ones going from left to 
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right as the left-moving messages ri^_^-_^_-^^{xi), i = 1, . . . , N^^ — 1 (see Fig. EJ. 'r]f^-_^{xi) 
is the marginal of Xi in the absence of the left part of the chain beyond Xj. Then one 
has (for the left-moving messages): 



L / X _ ^i^A^i) Hxj 



^ ^ [I + Vti-.^ix,)ie'- - 1)] , (17) 

where we have used that ?7j^j+i(a;j) = 1 and Z is the corresponding normalization 
factor. The recursion for the right-moving messages is similar, with the i + 1 and i — 1 
playing reverse roles. 

Once all the r]^ and rj^ are computed, one can then obtain the desired ih^^i{xi), 
taking care of the fact that the message nii^^^Xi) should not be included (see eq. [T^ . 
using 

^Hxi 

rh,^^{x,) = ^ [1 + r/f_i_.(x.)(e"" - 1)] [l + vf+,_,{x,){e'^ - 1)] , (18) 

where Z is again a normalization. 

Finally, the expected value of the variable = J^ie^j. ^« '^^^ t)e computed using the 
average value of Xj in presence of the random field (and this time therefore including 
the term mj_^^) given by 

H^i) = ^ - — ' (^^^ 

from which we can compute the expected value of the sum of the variables M{H) along 
the line n: 

M{H) = E I 5^ X, J = 5^ E(x,) . (20) 

A standard result of statistical physics is that M{H) is a monotonous function of H. 
This is a consequence of the fact that there are no phase transitions in a one-dimensional 
Potts model [SOI EI]. This means that one can apply safely a dichotomy algorithm to 
find the value of H such that = M{H). 

With the canonical method, one cannot fix exactly ^jg^Xj but rather its 
expectation, and this means that we are allowing fiuctuations (typically of order 0{^Jy^)) 
around the strict tomographic constraint. Of course, one might also want to enforce 
strictly this constraint, in which case one has to resort to the microcanonical method. 

Given these equations, the complete marginals for each variable can be computed 
using Eq. ( fT3l) . The marginal mj(xi) denotes the probability that variable i is assigned 
the value x^; and the most likely labeling is thus given, at each step of the algorithm, 
by 

X* = argmaXj,.mi(xi). (21) 

After convergence of the algorithm, the values x* thus represent the segmentation into 
discrete labels that is looked for. 
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3.4- A special case: binary tomography 

We shall now specialize for the rest of this paper to binary tomography, that is, the 
case g = 2, where the variables take values Xi = —1,1 (after a suitable rescaling of 
experimental data) [||. In physics terminology, this amounts to the Ising model, and 
we shall sometimes use the term "spins" for pixels. In this case, it is convenient to 
rewrite the expressions in a simpler way. Indeed, for binary variables x one can write a 
probability using a real variable h (which we will call the field) via: 

P(x) = -—— . (22) 
^ ^ 2 cosh /i ^ ^ 

Therefore, we define the fields hi^^ and h^^i by 

^--^^^'^ = 2cosh/..^/ ^2') 

g ^ — yi^i 

m^^,{xi) = . (24) 

2 cosh hfj^^i 

Using this notation, we can rewrite the message from a variable to a node eq. fill I) 
as follows: 

hi^j = ^ ^ ^/i— >i • (25) 
The message from a node to a variable, on the other hand, reads: 

The marginal m^j_^i[xi) thus corresponds to the one of a one-dimensional random field 
Ising model, where the external fields are given by hj^fj,, and constrained to have a 
total magnetization y^. The recursion "within a line" (corresponding to Eq. (fTTIl in 

the general case) can also be written in terms of fields, using 77^ (x) = 2 coshl-^ 
V'^(^) = I^ETP?' we have 

— atanh [tanh tanh {H + hi^^ 



atanh [tanh tanh {H + hi^^ + n^^^^J] . (29) 
Again, once these are computed, the marginal can be expressed as 

m^^i{xi) = , (30) 

2 cosh n^^i 

K^i = uti^i + uf^^^, + H . (31) 



I Even if the exact values of the discrete levels are not known, a method for estimating the discrete 
values has been proposed by Batenburg et al. [52] 
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Finally, the expected value of the variable = J2iet^ ^^^^ needed to fix the correct 
value of H, is given by summing the average value of Xi (with all random fields), each 
of them given by 

E(xi) = tanh [hi^f, + hf,^i) , (32) 

which is the analog of eq. ( fT9l) for the binary case. Finally, the most likely assignment 
for the values Xi at each step of the algorithm is given by 



X* = sign I tanh ^ h^^^ j = sign | ^ h^^, j . (33) 
3. 5. Implementation of the binary algorithm 

We shall now summarize the complete algorithm for the binary case. Let us start by a 
few comments about the practical implementation. 

• An efficient way to initialize the fields is to compute their equilibrium value 
for J = 0: 

hf,^i = atanh {y^/N^) , (34) 

where A'^ is the number of variables on the line /i. This initialization is also used 
in |12]. 

• Along a light ray slanted with respect to the pixel grid, the distance between 
successive spins can be larger than 1. In this case, we use a couphng between 
the spins that decreases when the distance between neighbors increases. We use 

= atanh (tanh'^ j) where D is the ii distance (Manhattan or taxi-cab distance) 
between successive variables on the line This choice, suggested by the so-called 
high-temperature approximation [53], mimics the effective correlation between two 
variables at distance D in the two-dimensional Ising model. 

• We have observed that the results x* depend only weakly on the value of J, as long 
as J = 0(1). Hence, we did not optimize the value of J: all the results presented 
here were obtained with J = 0.2. 

• In the case of noise-free measures, we stop the algorithm when all line-sums 
constraints are satisfied by the x*. For noisy measures, we stop the algorithm 
when the number of fiipping spins saturates; this criterion is discussed in Sec. O 

• We have observed empirically that damping the update of the fields is sometimes 
necessary for a proper convergence of the algorithm. We thus replaced eq. ( 13T1) by 

h';^, = sh'^^, + (1 - s) + + H) . (35) 

In our implementation, we have fixed empirically the damping value s to 1 — 1.6/ng. 
This scaling was found to ensure convergence for all the values of ng and A^ tested. 

• In order to avoid overfiow or underfiow problems, we clip the value of the different 
fields inside an interval [—B,B]. We set B to 400, a value large enough for the 
result to be independent of B. 
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• In order to find the value of tlie external field H, a few dichotomy iterations are 
enough to find if up to a good precision e (we use e = 0.05). In order to speed up 
the computation of H, we use Newton's method when the total magnetization is 
close enough to y^. 

BP-TOMO(2/^, criterion, 

1 Initialize the messages h as described above. 

2 while conv > criterion and t < tmax: 



3 do 

4 t^t + 1. 

5 for each fi;i^fi: 

6 do Update hi^fj_ according to eq. (!25|) . 

7 for each ft^iEfi: 

8 do 

9 while (jy^-j:^^^E{xi)\<e) 

10 do Compute the messages and according to eq. f l28H29|) . 

11 Compute the new value of J2ufj.'^ update H (e.g. by dichotomy). 

12 Update the h^^i according to eq. fl35|l . 

13 do Compute the new values x* from eq.( l33|) and check for convergence. 



14 return signal components x*. 

Note that this algorithm is of the embarrassingly parallel type, since each line n 
can be handled separately when computing line 7 in the above pseudo-code. This 
could be used to speed up the algorithm by solving each line separately on different 
cores. A Python code (with some C extensions) of the current version is available on 
https://github.coin/eddain/bp-for-tomo, under a free-software BSD license. 
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Figure 4. Synthetic data generated for p = 14, 28, 38, and a size 256 x 256. 

4. Numerical test in the zero-noise case 
4.I. Generation of synthetic data 

In order to measure the performance of our algorithm, we generate synthetic images 
with different levels of sparsity of the gradients, but a similar binary microstructure. 
We obtain such a set of images with a controlled level of gradient sparsity as follows: 
p"^ pixels are picked at random inside the image and set to 1 while the background is 
set to 0, then a Gaussian filter of width proportional to 1/p is applied to the image. 
The resulting continuous-valued image is finally thresholded at the mean intensity to 
obtain a binary image. Examples of binary images with different values of p are shown 
in Fig. m A simple measure of the sparsity of our images is given by 

P(^) = (36) 

where i{x) is the length of the internal boundary of one of the two phases in the image. 
The internal boundary is the set of pixels that have within their 4-connected neighbours 
at least one pixel belonging to the other phase. [§| p therefore measures the fraction of 
pixels lying on the boundary. We choose this measure since the original image can be 
retrieved from the knowledge of the internal boundary (and the value of a single pixel). It 
would also have been possible to choose the set of pixels for which the finite-differences 
gradient of the image is non-zero. However, the latter set of pixels is redundant for 
retrieving the image, and leads to a slight over-estimation of the sparsity of the image. 
Using the definition of Eq. fl36|l , we find that p scales linearly with p. 

4-2. Reconstruction in the absence of noise 

Let us first investigate the effect of the number of angles on the reconstruction of a given 
image, when no noise is added to the projections: y = Fx. We define the undersampling 

§ The internal boundary can be computed by simple mathematical morphology operations |54| , as the 
difference between the origin image and its morphological erosion. Morphological erosion is one of the 
two fundamental operations in mathematical morphology [54], realizing the intuitive idea of eroding 
binary sets from their boundaries. 
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Figure 5. (a) Decay of the number of errors w.r.t. ground truth vs. the number 
of iterations of BP-tomo, for p = 15, L ~ 256 and a number of angles varying from 
66 (blue) to 20 (red). The same value of J = 0.2 was used for all cases. Note how 
the error reduction becomes slower when the number of projections is decreased. For 
smaller number of projections (not shown here) , the fraction of errors saturates at an 
important level, (b) Number of iterations required to reach an exact reconstruction 
versus undersampling rate a, for p = 22 and J = 0.2. The number of iterations 
diverges when the transition between exact reconstruction and faulty reconstruction is 
approached, (c) Critical undersampling rate ac vs. boundary density p. 
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rate by the ratio between the number of measures and the number of variables: 

" = f ■ (37) 

Since behef propagation is not an exact algorithm, it is not guaranteed to reconstruct 
the exact original image. Nevertheless, we find experimentally that for a sufficient 
number of measures, the algorithm always reconstructs the exact image after a finite 
number of iterations. Once all spins have reached the correct orientation, we observe 
that their orientation does not change any more. For a given image, the evolution of the 
discrepancy between the segmentation of the magnetization (Eq. fl33|) ) and the ground 
truth is plotted in Fig. |5] (a) against the number of iterations, for different measurement 
rates a. The number of errors decays roughly exponentially with time, but the decay 
rate increases with the number of measurements: the convergence towards the exact 
image is faster when more measurements are available. Also, a sharp transition is 
observed for a critical a, under which the number of errors does not reach zero, and 
eventually increases again at long times. The transition between the two regimes is hard 
to estimate accurately, since the convergence time seems to diverges when the transition 
is approached, as shown in Fig. [5] (b). We estimate the critical undersampling rate 
from the lowest number of angles for which exact reconstruction is reached before 400 
iterations. 

We have measured the critical undersampling rate etc for images with different sizes 
of the microstructure, hence different levels of sparsity. The critical undersampling rate 
is plotted against the boundary density p (Eq. (l36l) ) in Fig. [5] (c). We observe a linear 
relationship 



This simple relationship corresponds to a very good recovery performance, since pN can 
be seen as the number of unknowns needed to retrieve the image. Therefore, we only 
need a number of measurements M = aN comparable to the number of unknowns for 
an exact reconstruction of the image. 



Oic ^ p- 
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5. Numerical test in the noisy case 

Now that we have estabhshed the phase diagram of our algorithm, we wish to assess 
its performance for noisy measurements. Different sources of noise may corrupt the 
measurements in X-ray tomography [55]. Here we add a Gaussian white noise of fixed 
amplitude to the projections. We define the measure noise to signal ratio (NSR) by 

NSR = y, (39) 

where a is the standard deviation of the additive noise. 

In the noisy case, it is not possible any more to stop the algorithm when all 
constraints are satisfied. Nevertheless, for a small value of the noise the algorithm 
still converges to an exact reconstruction (Fig. |6]). When there is convergence to the 
exact solution, we observe that this solution is stable: all spins keep their orientation 
during further iterations of the algorithm. Convergence is reached after a number of 
iterations similar to the noise-free case. For a larger amplitude of the noise, a finite 
fraction of errors remains. We observe that the fraction of errors first decreases and 
reaches a minimum, then starts to slowly increase again. A good choice of the stop 
criterion is therefore important in order to optimize the quality of the reconstruction. 
Empirically, we found that the best number of iterations correlates well with the number 
of iterations needed to reach exact reconstruction in the small- noise case. It can also be 
detected by a change of slope in the decay of spins flipped between successive iterations 
(see the inset in Fig. |6]). 

5.1. Robustness to noise 

In Fig. [7l we have plotted the reconstruction error for the same image against the 
measure noise to signal ratio, for two values of the measurement rate a = 1/4 and 
a = 1/10. An image with p = lA (Fig. H] left) was used, and J = 0.2 was used for 
all values of the NSR. We have also computed the reconstruction error for the convex 
minimization of Eq. (|6]). For a fair comparison, the best value of /3 minimizing the 
reconstruction error was computed using Brent's method (whereas no parameter was 
optimized for BP-tomo). We observe that the reconstruction quality is better for BP- 
tomo than for the convex algorithm, up to a threshold above which the error is greater 
for BP-tomo. Nevertheless, the values of the error reached at the threshold are too 
high for a satisfying reconstruction for several applications. Interestingly, there is a first 
regime for small noise, for which the fraction of errors is zero or very small for BP-tomo, 
while the error increases much faster for the convex algorithm. 

Fig. [8] shows that the reconstruction errors are mostly located on the boundaries 
between the two phases for the two algorithms (except for a few isolated errors for BP- 
tomo), but that a larger fraction of the interfaces are correctly reconstructed for BP- 
tomo. Fig. [8] also displays the continuous magnetization, as well as the non-segmented 
minimization of the convex algorithm. We see that the contours of the objects are 
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Figure 6. Stop criterion in the noisy case - Main axes: fraction of error compared 
to ground truth vs. number of iterations n for a signal to noise ratio of 0.6 % (•), 1 % 
(t) and 2 % (*). For a small amplitude of the noise, the number of errors converges to 
zero (SNR = 0.6%). For a large amplitude of the noise, the fraction of errors reaches a 
minimum after the number of iterations needed to reach an exact reconstruction for a 
smaller value of the noise (here, n = 15). Inset: number of flipped spins between two 
iterations vs. number of iterations. We observe that the minimum fraction or errors 
correlates well with a change of slope for the number of flipped spins. 
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Figure 7. Fraction or error Ne/L"^ vs. normalized measure noise a/L, for BP-tomo 
and the convex algorithm (TV). These numerical tests have been performed for two 
different undersampling rates a. For a small noise, BP-tomo always outperforms the 
convex algorithm. In particular, there is a finite interval of noise intensity for which 
the reconstruction is error-free. 
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Figure 8. (a) BP-tomo, reconstructions for ct/L = 0.006, 0.01, 0.02, 0.03 (from 
left to right). The top row shows the magnetization of the pixels, and the bottom 
row the segmentation, with segmentation errors contoured in blue (resp. magenta) 
for wrong pixels in the a; = — 1 (resp. x = 1) phase. Note how the absolute 
value of the magnetization decreases when the measure noise increases, revealing 
a greater uncertainty. (b) TV (convex algorithm), reconstructions for x/L = 
0.006, 0.01, 0.02, 0.03 (from left to right). The top row displays the result of the convex 
optimization, and the bottom row its segmentation. Boundaries between domains are 
not as sharp as for BP-tomo, resulting in more segmentation errors close to boundaries. 
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Figure 9. Fraction or error Ne/ as a function of the measurement rate a, for BP- 
tomo and the convex algorithm (TV) (inset in log- log coordinates). These numerical 
tests have been performed for a small noise ajh ~ 0.002. For both algorithms, the 
number of errors decrease when the measurement rate is increased, but the evolution 
is very different. For BP-tomo, the evolution is very sharp: above a certain threshold, 
the reconstruction is error-free. However, below this threshold the number of errors 
displays a sharp transition and converges quickly to the value for a random image (50 
% of errors). The convex algorithm fares better below the BP-tomo threshold, but does 
not have a "phase transition", so that it is outperformed by BP above the threshold. 

delineated more accurately for BP-tomo than for the convex algorithm, especially for 
objects with concavities. 

For a given value of the NSR, we observe in Fig. [7] that the reconstruction error 
increases when the measurement rate decreases. The effect of the measurement rate in 
the noisy case is studied in the next paragraph. 

5.2. Influence of the number of measures 

For the same image (p = 14, J = 0.2), we have fixed the SNR to 0.002 and computed 
the reconstruction error as a function as the measurement rate, for the two algorithms. 
Results are shown in Fig. [91 For BP-tomo, we observe a sharp transition between an 
exact reconstruction to a failed reconstruction when the measurement rate is decreased 
(a score of 0.5 would be obtained when labeling pixels at random). Therefore, the 
noisy case displays a transition between failure and success as in the noise-free case 
(Fig. [5l). The transition is observed at the same measurement rate as the noise-free 
case. Approaching the transition from above, a finite error fraction appears (see the 
log- log inset in Fig. |9]). In the noise- free case, exact reconstruction could be reached 
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Figure 10. Fraction or error Nel as a function of the measurement rate a, for BP- 
tomo and two different values of the noise amplitude (inset in log-log coordinates) . The 
transition is sharper for a small noise. In the inset, the dashed vertical line represents 
the empirical value of the noise-free critical undersampling rate. 

until the transition, but an increasing number of iterations was needed when approaching 
the transition (Fig. |5] (b)). 

In contrast, the evolution of the error fraction when decreasing the measurement 
rate is much smoother for the convex algorithm, and does not display a phase transition 
in Fig. IHl As a result, the error fraction is larger for the convex algorithm above the 
BP-tomo transition. At low measurement rates, the error fraction is smaller for the 
convex algorithm. However, the crossover corresponds to values of the error fraction 
that are not acceptable for most applications. 

For a larger value of the noise. Fig. [10] shows that a transition between good and 
failed reconstruction is still observed. However, the fraction of errors is larger, therefore 
the transition is not as sharp as for smaller NSR. Comparing Fig. |9] and Fig. [TOl we 
conclude that the reconstruction quality is better above the transition for a SNR of 0.01 
than for the convex algorithm for a SNR five times smaller. 
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6. Discussions and perspectives 

The belief-propagation algorithm presented in this article is found to have excellent 
recovery properties for tomographic binary reconstruction. Indeed, in the noise-free case 
the exact original image can be reconstructed from a number of tomographic measures 
approximately equal to the number of pixels lying on the internal boundary of the 
objects, that can be viewed as the number of unknowns in our problem. We are not aware 
of other studies on discrete tomography where the empirical computation of the critical 
undersampling rate as a function of the data sparsity was performed. In Batenburg's 
DART algorithm [9] , a sharp transition between exact and faulty reconstruction is also 
observable for the different kinds of phantoms studied, but the relation between the 
complexity of the data and the critical undersampling rate was not described. Our 
search for recovery bounds is inspired by empirical and theoretical results obtained in 
the field of compressed sensing about phase transitions in recovery [561 IS3 EHl 159] . 
Obtaining theoretical results for the discrete tomography problem is more difficult 
than for signals sparse in a known basis - our measure of image "sparsity" comes 
from set theory and mathematical morphology, and applies only to discrete images. 
Nevertheless, it is possible to obtain empirical bounds as we did in Section |H For a 
more complete phase diagram of our algorithm, one should also investigate corrections 
depending on the size of the data A^, that appear for example in the io — ii Donoho- 
Tanner phase transition [56]. A probability of exact recovery could also be obtained 
by computing the critical undersampling rate for a large number of images sharing the 
same sparsity [561 EZ] • 

When Gaussian noise corrupts the measurements, we observe that an exact 
reconstruction can still be retrieved for small noise and a sufficient number of 
measurements. For a fixed non-zero level of noise, the fraction of segmentation errors 
grows when the number of measurements is decreased, but the fraction of errors is 
smaller than the value of the noise over signal ratio for a large window of undersampling 
rates. At the noise-free critical undersampling rate, the fraction of errors grows very fast 
when the number of measurements is decreased further. Above the noise-free critical 
undersampling rate and for moderate noise, we find that our algorithm systematically 
outperforms a convex algorithm implementing the convex relaxation of the binary 
tomography problem. The value of the noise corresponding to the crossover when the 
convex algorithm becomes better increases when the undersampling rates increases (i.e., 
when one moves away from the transition). At the crossover, the reconstruction error 
is high - typically, most pixels lying on objects boundaries are wrong -, making the 
reconstruction unsuitable for further processing or interpretation in many cases. We 
therefore conclude that the belief-propagation algorithm is better in most cases where 
further processing of the images needs to be done. 

For practical applications, if the measurement noise is known and the 
undersampling rate can be selected, we suggest that the operator should select the 
undersampling rate to allow for a comfortable margin with respect to the critical noise- 
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free undersampling rate. This will speed up the convergence rate (see Fig. \5\ (b)) and 
reduce significantly the measurement error when the noise is important (compare the 
plots in Fig. [7] for a = 1/4 and a = 1/10 for 1% of noise and more). In other words, 
trying to approach the limit — p is more interesting for theoretical than practical 
reasons. However, the good news is that one does not need to go very far from the 
transition for an efficient and accurate reconstruction, all the more for a small noise 
amplitude. A systematic evaluation of the reconstruction error as a function of a, p and 
the NSR is out of the scope of this article, but would be of interest for applications. 

In future work, the recovery properties of our algorithm should be tested on real 
experimental data. In order to reconstruct large 3-D volumes in a reasonable time, 
our actual implementation of the algorithm can be improved on the numerical side - 
for example with an implementation coded on the graphics processing unit (GPU) - 
and probably also on the algorithmic side. For example, a different update scheme 
for the message passing might accelerate the convergence of the algorithm. For a 
better precision when the density of boundaries in the image is high (large p), the 
geometrical factors F^j (see Section [3]) should be computed, as they are in usual algebraic 
reconstruction algorithms. For 3-D images, the additional knowledge that neighboring 
horizontal slices are likely to have the same pixel values can be easily implemented in 
the algorithm as well. This should lead to an even better recovery compared to the 2-D 
case. 

We also plan to test the recovery properties in the multilabel case (g > 2). When q 
becomes large, it remains an open question to know how the discrete algorithm performs 
compared to the convex continuous algorithm. 

Finally, for in-situ tomography [6] the sample is rotating continuously to speed-up 
the acquisition rate, in the case of fast transformations. It would therefore be interesting 
to adapt our algorithm to the case of continuous acquisition, when one image taken on 
the detector integrates the projection of the sample over a finite angular sector. 
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